Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements

Authors

Gregory Marton

Alexey Radul

Abstract

The TREC Definition and Relationship questions are evaluated on the basis of information nuggets that may be contained in system responses. Human evaluators provide informal descriptions of each nugget, and judgements (assignments of nuggets to responses) for each response submitted by participants. While human evaluation is the most accurate way to compare systems, approximate automatic evaluation becomes critical during system development. We present Nuggeteer, a new automatic evaluation tool for nugget-based tasks. Like the first such tool, Pourpre, Nuggeteer uses the words shared by a candidate answer and the answer key to approximate human judgements. Unlike Pourpre, but like human assessors, Nuggeteer creates a judgement for each candidate-nugget pair, and can use existing judgements instead of guessing. This creates a more readily interpretable aggregate score, and allows developers to track individual nuggets through the variants of their system. Nuggeteer is quantitatively comparable in performance to Pourpre, and provides qualitatively better feedback to developers.
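This abstract does not give Nuggeteer's exact matching rule, so the following Python code is only a minimal sketch of the two ideas it describes: guessing a judgement for every candidate-nugget pair from the words a response shares with a nugget description, and reusing an existing judgement whenever one is available. The tokenizer, the overlap measure, the 0.5 threshold, the function names `tokens`, `judge` and `judge_all`, and the example data are all hypothetical assumptions, not Nuggeteer's actual interface or algorithm.

```python
import re
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, str]  # (response id, nugget id)


# Hypothetical tokenizer: lowercased alphanumeric words. A real tool would
# likely also stem words and ignore stopwords; this sketch does neither.
def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def judge(response: str, nugget_description: str, threshold: float = 0.5) -> bool:
    """Guess whether `response` contains a nugget: True when at least
    `threshold` of the description's distinct words also occur in the response."""
    key = tokens(nugget_description)
    if not key:
        return False
    overlap = len(key & tokens(response)) / len(key)
    return overlap >= threshold


def judge_all(responses: Dict[str, str], nuggets: Dict[str, str],
              known: Optional[Dict[Pair, bool]] = None,
              threshold: float = 0.5) -> Dict[Pair, bool]:
    """Produce one judgement per (response, nugget) pair."""
    known = known or {}
    judgements: Dict[Pair, bool] = {}
    for rid, text in responses.items():
        for nid, desc in nuggets.items():
            pair = (rid, nid)
            # Prefer an existing (e.g. human) judgement; guess only when none exists.
            if pair in known:
                judgements[pair] = known[pair]
            else:
                judgements[pair] = judge(text, desc, threshold)
    return judgements


if __name__ == "__main__":
    # Hypothetical answer key and system responses.
    nuggets = {
        "n1": "born in 1879 in Ulm",
        "n2": "won the Nobel Prize in Physics in 1921",
    }
    responses = {
        "r1": "Einstein was born in Ulm, Germany, in 1879.",
        "r2": "He received a Nobel Prize for the photoelectric effect.",
    }
    # Hypothetical existing assessor judgement for one pair.
    known = {("r2", "n2"): True}
    for (rid, nid), verdict in judge_all(responses, nuggets, known).items():
        print(rid, nid, verdict)
```

In this example the word-overlap guess for `r2`/`n2` covers only 3 of the 7 description words, which falls below the 0.5 threshold, so guessing alone would mark the nugget absent; the stored judgement overrides it. Keeping a per-pair table like this, rather than a single similarity score, is what lets a developer see which individual nuggets a system variant gained or lost.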

Similar resources

Improving Complex Interactive Question Answering with Wikipedia Anchor Text

When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document will often not be used in the most relevant information nugget in the document. In this paper, a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected automatically. Evaluated with the Nug...

Comparing Automatic Evaluation Measures for Image Description

Image description is a new natural language generation task, where the aim is to generate a human-like description of an image. The evaluation of computer-generated text is a notoriously difficult problem; however, the quality of image descriptions has typically been measured using unigram BLEU and human judgements. The focus of this paper is to determine the correlation of automatic measures w...

IIT Kharagpur at TAC 2009: Statistical and Nugget-based Model for Automatic Summary Evaluation

In this paper we present our participation at TAC 2009 AESOP (Automatically Evaluating Summaries Of Peers) task. We make use of a statistical model for evaluation metric correlating to Overall Responsiveness and a nugget-based pyramid model for correlating to the Pyramid manual metric of TAC 2009. We also present the performance of our three submitted runs as per the official TAC 2009 evaluatio...

Towards Succinct and Relevant Image Descriptions

What does it mean to produce a good description of an image? Is a description good because it correctly identifies all of the objects in the image, because it describes the interesting attributes of the objects, or because it is short, yet informative? Grice’s Cooperative Principle, stated as “Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose o...

Improvement of generative adversarial networks for automatic text-to-image generation

This research is related to the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous researches have used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions that are presented in the form of sentences to produce and improve the image. The proposed ...

Publication date: 2006